| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 1.2267 | 1.6937 |
| INLA-RF | 0.6678 | 1.0797 |
SPDE-RF (spatial only)
Introduction
This example combines Bayesian inference (INLA) with Random Forest (RF) and shows the result of the first loop of the combined algorithm. To this end, two scenarios with spatial dependence and different non-linear structures in the covariates are simulated.
The two scenarios differ in the strength of the non-linear effects:
- strong non-linearity scenario,
- weak non-linearity scenario.
In both scenarios, the data include a categorical variable with three levels (A, B, C), a spatial effect, and two covariates. In the strong non-linearity scenario, both covariates have a non-linear effect, whereas in the weak non-linearity scenario only one of them does.
Data simulation
The model for data simulation is defined as follows:
\[\mathbf{y} = \mathbf{X}\boldsymbol\beta + \sum_{k\in K} \mathbf{f}_k(\mathbf{z}_k) + \mathbf{u}_s + \boldsymbol\varepsilon\] where \(\mathbf{X}\boldsymbol\beta\) collects the linear effects: the categorical variable and, in the weak non-linearity scenario, the covariate with a linear effect. The non-linear effects of the covariates are captured by \(\sum_{k\in K} \mathbf{f}_k(\mathbf{z}_k)\), where \(|K|=1\) corresponds to the weak non-linearity scenario and \(|K|=2\) to the strong non-linearity scenario. Finally, \(\mathbf{u}_s\) is the spatial effect and \(\boldsymbol\varepsilon\) is the Gaussian noise of the observations. Using this structure, \(1000\) observations are simulated at random locations within the study region.
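Written out per scenario, the model takes the following form. This is a sketch under an assumed labelling: \(\mathbf{z}_1\) is taken to be the covariate that is non-linear in both scenarios, \(\mathbf{z}_2\) the one that enters linearly in the weak scenario, and \(\mathbf{X}_c\boldsymbol\beta_c\) (categorical part) and \(\beta_z\) are names introduced here only for illustration.

\[\text{strong:}\quad \mathbf{y} = \underbrace{\mathbf{X}_c\boldsymbol\beta_c}_{\mathbf{X}\boldsymbol\beta} + \mathbf{f}_1(\mathbf{z}_1) + \mathbf{f}_2(\mathbf{z}_2) + \mathbf{u}_s + \boldsymbol\varepsilon\]

\[\text{weak:}\quad \mathbf{y} = \underbrace{\mathbf{X}_c\boldsymbol\beta_c + \beta_z\mathbf{z}_2}_{\mathbf{X}\boldsymbol\beta} + \mathbf{f}_1(\mathbf{z}_1) + \mathbf{u}_s + \boldsymbol\varepsilon\]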
For data simulation, we will first define and simulate the components of the model in the following order:
- defining the study region and the mesh for simulating the spatial effect (SPDE-FEM),
- simulating the spatial effect and the categorical variable (common to both scenarios), and
- simulating the non-linear effects for each scenario.
These components constitute the linear predictor of the model.
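The components listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this example: the study region is taken to be the unit square, the functional forms `f1` and `f2`, the categorical effects, and the noise level are all hypothetical, and `spatial_surrogate` is a smooth stand-in for the spatial effect rather than a draw from an SPDE model on a mesh.

```python
import math
import random

# Hypothetical effect sizes; none of these values come from the original example.
CAT_EFFECT = {"A": 0.0, "B": 1.0, "C": -1.0}

def spatial_surrogate(x, y):
    """Smooth stand-in for the spatial effect u_s; NOT an SPDE/Matern simulation."""
    return math.sin(2 * math.pi * x) * math.cos(2 * math.pi * y)

def f1(z):
    """Assumed non-linear effect of the first covariate (both scenarios)."""
    return math.sin(3 * z)

def f2(z):
    """Assumed non-linear effect of the second covariate (strong scenario only)."""
    return z ** 2

def simulate(n=1000, scenario="strong", sigma=0.3, seed=1):
    """Simulate n observations; returns a list of dicts, one per observation."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()      # random location in the unit square
        cat = rng.choice("ABC")                # three-level categorical variable
        z1, z2 = rng.uniform(-2, 2), rng.uniform(-2, 2)
        eta = CAT_EFFECT[cat] + f1(z1) + spatial_surrogate(x, y)
        # Second covariate: non-linear in the strong scenario, linear in the weak one.
        eta += f2(z2) if scenario == "strong" else 0.5 * z2
        rows.append({"x": x, "y": y, "cat": cat, "z1": z1, "z2": z2,
                     "eta": eta, "resp": eta + rng.gauss(0.0, sigma)})
    return rows

strong = simulate(1000, "strong")
weak = simulate(1000, "weak")
```

Using the same seed for both scenarios keeps the locations, categories, and covariate values identical, so the two data sets differ only in how the second covariate enters the linear predictor.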
Defining the study region and the mesh for the spatial simulation
Simulating the spatial effect (SPDE2) and the categorical variable
Simulation of the response variable under the strong non-linearity case
Simulation of the response variable under the weak non-linearity case
Analysis of the data under the strong non-linearity case
The analysis of the simulated data follows the same procedure in each of the two scenarios:
- Split the data into training and test sets (the test set contains \(20\%\) of the observations).
- Perform Bayesian inference with two models, simple and complex:
  - The simple model assumes that the effects of the covariates are linear.
  - The complex model allows a non-linear structure for the effects of the non-linear covariates.
- Compute the residuals using the mean of the posterior marginal distribution of the expectation for each data point in the training and test sets.
- Analyze the point estimates of the residuals using Random Forest (RF), following one of two strategies:
  - using the values of the covariates and the Cartesian coordinates of the observation locations, or
  - using the mean values of the marginal posterior distributions of the non-linear effects and the spatial effect (to capture the geometry of the marginal posterior distributions).
- Compare the RMSE on the training and test sets between Bayesian inference alone (INLA) and Bayesian inference combined with the RF residual analysis (INLA-RF).
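The bookkeeping of this procedure (split, residuals, residual correction, RMSE comparison) can be sketched as follows. It is a minimal illustration, not the implementation behind the tables: the posterior means are assumed to be given, and `fit_predict` is a hypothetical callable standing in for the Random Forest, so any regressor with that shape could be plugged in.

```python
import math
import random

def train_test_split(n, test_frac=0.2, seed=1):
    """Return (train_idx, test_idx) with a fraction test_frac of indices held out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def rmse(obs, pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def residuals(obs, post_mean):
    """Point residuals: observation minus posterior mean of the expectation."""
    return [o - m for o, m in zip(obs, post_mean)]

def combine(post_mean, resid_pred):
    """Combined prediction: posterior mean plus the predicted residual."""
    return [m + r for m, r in zip(post_mean, resid_pred)]

def first_loop(y_tr, m_tr, y_te, m_te, feats_tr, feats_te, fit_predict):
    """One pass of the combined analysis; returns (train RMSE, test RMSE) per method.

    fit_predict(feats_tr, resid_tr, feats_new) is a hypothetical stand-in for the
    Random Forest: it fits on the training residuals and predicts for feats_new.
    """
    resid_tr = residuals(y_tr, m_tr)
    r_tr = fit_predict(feats_tr, resid_tr, feats_tr)
    r_te = fit_predict(feats_tr, resid_tr, feats_te)
    return {
        "INLA": (rmse(y_tr, m_tr), rmse(y_te, m_te)),
        "INLA-RF": (rmse(y_tr, combine(m_tr, r_tr)),
                    rmse(y_te, combine(m_te, r_te))),
    }

# Toy residual model for demonstration only: predicts the mean training residual.
def mean_model(feats_tr, resid_tr, feats_new):
    return [sum(resid_tr) / len(resid_tr)] * len(feats_new)

train_idx, test_idx = train_test_split(1000)
toy = first_loop([1.0, 2.0, 3.0], [0.0, 1.0, 2.0],
                 [4.0, 5.0], [3.0, 4.0],
                 [[0], [1], [2]], [[3], [4]], mean_model)
```

The two strategies described above differ only in what is passed as `feats_tr` and `feats_te`: covariates and coordinates in the first case, posterior means of the non-linear and spatial effects in the second.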
Alternatively, the residuals could be derived from the entire marginal posterior distribution of the expectation, rather than from its mean alone.
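If draws of the expectation are available rather than only its posterior mean, the residuals can be carried as draws as well. The sketch below assumes a hypothetical input `mu_draws`, holding one list of expectations per posterior draw; how such draws are obtained is outside its scope, and the summary shown (mean and standard deviation per data point) is just one possible choice.

```python
import statistics

def residual_draws(obs, mu_draws):
    """mu_draws[s][i] is draw s of the expectation at data point i.

    Returns residual draws with the same layout: obs[i] - mu_draws[s][i].
    """
    return [[o - m for o, m in zip(obs, draw)] for draw in mu_draws]

def summarize(resid):
    """Per-point mean and (population) standard deviation across draws."""
    per_point = zip(*resid)  # regroup from per-draw to per-point
    return [(statistics.fmean(p), statistics.pstdev(p)) for p in per_point]

# Toy input: two data points, two draws of the expectation.
draws = residual_draws([1.0, 2.0], [[0.0, 2.0], [1.0, 1.0]])
summary = summarize(draws)
```

The per-point mean reproduces the point residual used in the procedure above, while the spread indicates how uncertain that residual is.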
Simple INLA model and RF combined analysis (first loop only)
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 1.2267 | 1.6937 |
| INLA-RF | 0.6252 | 1.0619 |
Complex INLA model and RF combined analysis, sharing geometry information (first loop only)
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.2765 | 0.5506 |
| INLA-RF | 0.2791 | 0.5227 |
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.2765 | 0.5506 |
| INLA-RF | 0.2616 | 0.5554 |
Analysis of the data under the weak non-linearity case
Simple INLA model and RF combined analysis (first loop only)
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.1758 | 0.5631 |
| INLA-RF | 0.1571 | 0.5499 |
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.1758 | 0.5631 |
| INLA-RF | 0.1472 | 0.5336 |
Complex INLA model and RF combined analysis, sharing geometry information (first loop only)
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.0733 | 0.4850 |
| INLA-RF | 0.0784 | 0.4829 |
| Model | Train RMSE | Test RMSE |
|---|---|---|
| INLA | 0.0733 | 0.4850 |
| INLA-RF | 0.0733 | 0.4878 |